 calibration error








Optimal Lower Bounds for Online Multicalibration

Collina, Natalie, Lu, Jiuyao, Noarov, Georgy, Roth, Aaron

arXiv.org Machine Learning

We prove tight lower bounds for online multicalibration, establishing an information-theoretic separation from marginal calibration. In the general setting where group functions can depend on both context and the learner's predictions, we prove an $\Omega(T^{2/3})$ lower bound on expected multicalibration error using just three disjoint binary groups. This matches the upper bounds of Noarov et al. (2025) up to logarithmic factors and exceeds the $O(T^{2/3-\varepsilon})$ upper bound for marginal calibration (Dagan et al., 2025), thereby separating the two problems. We then turn to lower bounds for the more difficult case of group functions that may depend on context but not on the learner's predictions. In this case, we establish an $\widetilde{\Omega}(T^{2/3})$ lower bound for online multicalibration via a $\Theta(T)$-sized group family constructed using orthogonal function systems, again matching upper bounds up to logarithmic factors.


Calibrated Multi-Level Quantile Forecasting

Ding, Tiffany, Gibbs, Isaac, Tibshirani, Ryan J.

arXiv.org Machine Learning

We present an online method for guaranteeing calibration of quantile forecasts at multiple quantile levels simultaneously. A sequence of $\alpha$-level quantile forecasts is calibrated if the forecasts are larger than the target value at an $\alpha$-fraction of time steps. We introduce a lightweight method called Multi-Level Quantile Tracker (MultiQT) that wraps around any existing point or quantile forecaster to produce corrected forecasts guaranteed to achieve calibration, even against adversarial distribution shifts, while ensuring that the forecasts are ordered -- e.g., the 0.5-level quantile forecast is never larger than the 0.6-level forecast. Furthermore, the method comes with a no-regret guarantee that implies it will not worsen the performance of an existing forecaster, asymptotically, with respect to the quantile loss. In experiments, we find that MultiQT significantly improves the calibration of real forecasters in epidemic and energy forecasting problems.
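
The calibration and ordering conditions stated in this abstract can be checked empirically. The sketch below is not MultiQT itself; it only measures, for an already-produced set of forecasts, how far the observed coverage at each level is from its target $\alpha$ and whether the forecasts are ordered across levels. The function names, the array layout (one row per time step, one column per level), and the treatment of ties (a forecast equal to the target counts as covering it) are assumptions made for illustration.

```python
import numpy as np


def coverage_by_level(forecasts, targets):
    """Fraction of time steps at which each level's forecast covers the target.

    forecasts: array of shape (T, K), one column per quantile level (assumed layout).
    targets:   array of shape (T,).
    Ties are counted as covered (an assumption; the abstract says "larger than").
    """
    forecasts = np.asarray(forecasts, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return np.mean(forecasts >= targets[:, None], axis=0)


def calibration_gaps(forecasts, targets, alphas):
    """Absolute difference between empirical coverage and each target level alpha."""
    alphas = np.asarray(alphas, dtype=float)
    return np.abs(coverage_by_level(forecasts, targets) - alphas)


def is_ordered(forecasts):
    """True if, at every time step, forecasts are non-decreasing in the level index."""
    forecasts = np.asarray(forecasts, dtype=float)
    return bool(np.all(np.diff(forecasts, axis=1) >= 0))
```

For example, calling `calibration_gaps(forecasts, targets, [0.25, 0.75])` on a `(T, 2)` forecast array returns one gap per level; a gap near zero at every level, together with `is_ordered(forecasts)` being true, corresponds to the two properties the abstract describes.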


Appendix A Broader Impact

Neural Information Processing Systems

Overconfidence in deep neural networks could easily lead to deployments in which predictions are made that should have been withheld. For the validation set, on the other hand, we care about the confidence of the "top predicted class". Independent binning: training samples and validation samples are grouped independently into their respective training-bins and validation-bins (Figure 1). The binning is adaptive, with 15 equal-mass bins. Figure 10: Common binning: training samples are grouped using the bin boundaries of the validation-bins.
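
As a rough illustration of adaptive binning with equal-mass bins, the sketch below computes a binned calibration error from top-class confidences. It is a generic expected-calibration-error estimate, not the procedure of the paper excerpted above; the function name `equal_mass_ece`, its inputs, and the choice to weight each bin by its share of samples are assumptions. Only the default of 15 bins is taken from the excerpt.

```python
import numpy as np


def equal_mass_ece(confidences, correct, n_bins=15):
    """Binned calibration error with adaptive (equal-mass) bins.

    confidences: top predicted class probability per sample, shape (N,).
    correct:     1 if the top prediction was right, else 0, shape (N,).
    Samples are sorted by confidence and split into n_bins groups of
    near-equal size, so bin boundaries adapt to the data.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    n = confidences.size
    order = np.argsort(confidences)
    ece = 0.0
    for idx in np.array_split(order, n_bins):
        if idx.size == 0:
            continue  # more bins than samples: skip empty groups
        # Gap between mean accuracy and mean confidence inside the bin.
        gap = abs(correct[idx].mean() - confidences[idx].mean())
        ece += (idx.size / n) * gap
    return float(ece)
```

Computing the bins separately for training and validation samples would correspond to the independent scheme mentioned above; reusing one set of boundaries for both would need the boundaries to be returned and passed in, which this sketch does not do.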


A PID Controller Approach for Adaptive Probability-dependent Gradient Decay in Model Calibration

Neural Information Processing Systems

During model optimization, the expected calibration error tends to overfit earlier than classification accuracy, indicating distinct optimization objectives for classification error and calibration error. To ensure consistent optimization of both model accuracy and model calibration, we propose a novel method that incorporates a probability-dependent gradient decay coefficient into the loss function. This coefficient exhibits a strong correlation with the overall confidence level.